added post_install in setup.py file #3883
base: main
- **Correctly patch pdfminer to avoid PDF repair**. The patch applied to pdfminer's parser caused it to occasionally split tokens in content streams, throwing `PDFSyntaxError`. Repairing these PDFs sometimes failed (since they were not actually invalid) resulting in unnecessary OCR fallback.

extra newline
@@ -32,6 +27,30 @@ def check_for_nltk_package(package_name: str, package_category: str) -> bool:
return False


# We cache this because we do not want to attempt
# downloading the packages multiple times
@lru_cache()
Based on the logic below, we don't need the cache. Furthermore (this is a somewhat pathological edge case, but...), caching means we could miss situations where the NLTK data is removed between checks.
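To make the edge case concrete, here is a minimal standard-library sketch. It does not use the PR's code: `cached_check`, `uncached_check`, and `demo` are hypothetical names, and a temporary directory stands in for the NLTK data folder. It only illustrates how an `lru_cache`-wrapped check can keep returning a stale answer after the data is deleted.

```python
import shutil
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache()
def cached_check(data_dir: str) -> bool:
    # Hypothetical stand-in for a cached "is the data present?" check.
    return Path(data_dir).is_dir()


def uncached_check(data_dir: str) -> bool:
    # Same check without caching: always looks at the filesystem.
    return Path(data_dir).is_dir()


def demo() -> tuple:
    root = tempfile.mkdtemp()
    data_dir = str(Path(root) / "tokenizers")
    Path(data_dir).mkdir()

    first = cached_check(data_dir)    # directory exists; result is stored
    shutil.rmtree(root)               # data removed between checks
    stale = cached_check(data_dir)    # served from the cache, not the disk
    fresh = uncached_check(data_dir)  # reflects the removal
    return first, stale, fresh


if __name__ == "__main__":
    print(demo())
```

If the check itself is cheap, as it is here, dropping the cache costs little and avoids the stale result.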
Add auto-download of NLTK data for Python environments. When a user installs the Python library without the image, the NLTK data is downloaded automatically.
Added `entry_points` in `setup.py`.
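As a rough illustration of the check-then-download behaviour described above, the sketch below takes the check and download steps as injected callables, so no cache is needed for repeated calls to be harmless. `ensure_packages`, `is_installed`, and `download` are hypothetical names, not the PR's API. In real code the check could be something like `check_for_nltk_package` from the diff and the download a wrapper around `nltk.download`, but that wiring is an assumption here.

```python
from typing import Callable, Iterable, List, Tuple


def ensure_packages(
    packages: Iterable[Tuple[str, str]],
    is_installed: Callable[[str, str], bool],
    download: Callable[[str], None],
) -> List[str]:
    """Download every (name, category) package that is not yet installed.

    Returns the names that were downloaded during this call.
    """
    downloaded = []
    for name, category in packages:
        # Checking first makes repeated calls harmless without a cache.
        if not is_installed(name, category):
            download(name)
            downloaded.append(name)
    return downloaded


if __name__ == "__main__":
    # In-memory stand-ins for the real check and download steps.
    present = {"punkt"}
    wanted = [("punkt", "tokenizers"), ("averaged_perceptron_tagger", "taggers")]
    print(ensure_packages(wanted, lambda n, c: n in present, present.add))
    print(ensure_packages(wanted, lambda n, c: n in present, present.add))
```

Because the check runs on every call, a second invocation downloads nothing, and a package removed later would be fetched again.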